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Abstract 

In this article we consider Bayesian parameter inference associated to partially-observed stochastic processes 
that start from a set Bo and are stopped or killed at the first hitting time of a known set A. Such processes 
occur naturally within the context of a wide variety of applications. The associated posterior distributions 
are highly complex and posterior parameter inference requires the use of advanced Markov chain Monte Carlo 
(MCMC) techniques. Our approach uses a recently introduced simulation methodology, particle Markov chain 
Monte Carlo (PMCMC) [T], where sequential Monte Carlo (SMC) |18l |27| approximations are embedded within 
MCMC. However, when the parameter of interest is fixed, standard SMC algorithms are not always appropriate 
for many stopped processes. In [111 I15| the authors introduce SMC approximations of multi- level Feynman-Kac 
formulae, which can lead to more efficient algorithms. This is achieved by devising a sequence of nested sets 
from Bo to A and then perform the resampling step only when the samples of the process reach intermediate 
level sets in the sequence. Naturally, the choice of the intermediate level sets is critical to the performance of 
such a scheme. In this paper, we demonstrate that multi-level SMC algorithms can be used as a proposal in 
PMCMC. In addition, we propose a flexible strategy that adapts the level sets for different parameter proposals. 
Our methodology is illustrated on the coalescent model with migration. 
Key-Words: Stopped Processes, Sequential Monte Carlo, Markov chain Monte Carlo 

1 Introduction 

In this article we consider Markov processes that are stopped when reaching the boundary of a given set A. These 
processes appear in a wide range of applications, such as population genetics [24] [14], finance [7], neuroscience [5), 
physics |17l 122] and engineering [5J 125] . The vast majority of the papers in the literature deal with fully observed 
stopped processes and assume the parameters of the model are known. In this paper we address problems when this 
is not the case. In particular, Bayesian inference for the model parameters is considered, when the stopped process 
is observed indirectly via data. We will propose a generic simulation method that can cope with many types of 
partial observations. To the best of our knowledge, there is no previous work in this direction. An exception is [5], 
where maximum likelihood inference for the model parameters is investigated for the fully observed case. 

In the fully observed case, stopped processes have been studied predominantly in the area of rare event simulation. 
In order to estimate the probability of rare events related to stopped processes, one needs to efficiently sample 
realisations of a process that starts in a set Bq and terminates in the given rare target set A before returning to Bq 
or getting trapped in some absorbing set. This is usually achieved using Importance Sampling (IS) or multi- level 
splitting; see[H] [3D] and the references in those articles for an overview. Recently, sequential Monte Carlo (SMC) 
methods based on both these techniques have been used in [5J |TT] [22] . In [TU] the authors also prove under mild 
conditions that SMC can achieve same performance as popular competing methods based on traditional splitting. 

Sequential Monte Carlo methods can be described as a collection of techniques used to approximate a sequence 
of distributions whose densities are known point-wise up to a normalizing constant and are of increasing dimension. 
SMC methods combine importance sampling and resampling to approximate distributions. The idea is to introduce 
a sequence of proposal densities and to sequentially simulate a collection of N 3> 1 samples, termed particles, 
in parallel from these proposals. The success of SMC lies in incorporating a resampling operation to control the 
variance of the importance weights, whose value would otherwise increase exponentially as the target sequence 
progresses e.g. [T5I I27| . 
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Applying SMC in the context of fully observed stopped processes requires using resampling while taking into 
account how close a sample is to the target set. That is, it is possible that particles close to A are likely to have very 
small weights, whereas particles closer to the starting set Bq can have very high weights. As a result, the diversity 
of particles approximating longer paths before reaching A would be depleted by successive resampling steps. In 
population genetics, for the coalescent model [24], this has been noted as early as in the discussion of [31] by the 
authors of Later, in the authors used ideas from splitting and proposed to perform the resampling step 
only when each sample of the process reaches intermediate level sets, which define a sequence of nested sets from Bq 
to A. The same idea appeared in parallel in [151 Section 12.2], where it was formally interpreted as an interacting 
particle approximation of appropriate multi-level Feynman-Kac formulae. Naturally, the choice of the intermediate 
level sets is critical to the performance of such a scheme. That is, the levels should be set in a "direction" towards 
the set A and so that each level can be reached from the previous one with some reasonable probability Q35]. This 
is usually achieved heuristically using trial simulation runs. Also more systematic techniques exist: for cases where 
large deviations can be applied in [T3] the authors use optimal control and in [51 [5] the level sets are computed 
adaptively on the fly using the simulated paths of the process. 

The contribution of this paper is to address the issue of inferring the parameters of the law of the stopped Markov 
process, when the process itself is a latent process and is only partially observed via some given dataset. In the 
context of Bayesian inference one often needs to sample from the posterior density of the model parameters, which 
can be very complex. Employing standard Markov chain Monte Carlo (MCMC) methods is not feasible, given the 
difficulty one faces to sample trajectories of the stopped process. In addition, using SMC for sequential parameter 
inference has been notoriously difficult; see [5J[53]. In particular, due to the successive resampling steps the simulated 
past of the path of each particle will be very similar to each other. This has been a long standing bottleneck when 
static parameters 9 are estimated online using SMC methods by augmenting them with the latent state. These 
issues have motivated the recently introduced particle Markov chain Monte Carlo (PMCMC) [T|. Essentially, the 
method constructs a Markov chain on an extended state-space in the spirit of [5], such that one may apply SMC 
updates for a latent process, i.e. use SMC approximations within MCMC. In the context of parameter inference for 
stopped process this brings up the possibility of using the multi- level SMC methodology as a proposal in MCMC. 
This idea to the best of our knowledge has not appeared previously in the literature. The main contributions made 
in this article are as follows: 

• When the sequence of level sets is fixed a priori, the validity of using multi-level SMC within PMCMC is 



• To enhance performance we propose a flexible scheme where the level sets are adapted to the current parameter 
sample. The method is shown to produce unbiased samples from the target posterior density. In addition, we 
show both theoretically and via numerical examples how the mixing of the PMCMC algorithm is improved 
when this adaptive strategy is adopted. 

This article is structured as follows: in Section[2]we formulate the problem and present the coalescent as a motivating 
example. In Section[3]we present multi-level SMC for stopped processes. In Section|4]we detail a PMCMC algorithm 
which uses multi- level SMC approximations within MCMC. In addition, specific adaptive strategies for the levels are 
proposed, which are motivated by some theoretical results that link the convergence rate of the PMCMC algorithm 
to the properties of multi-level SMC approximations. In Section[5]some numerical experiments for the the coalescent 
are given. The paper is concluded in Section |6] The proofs of our theoretical results can be found in the appendix. 

1.1 Notations 

The following notations will be used. A measurable space is written as (E, £) , with the class of probability measures 
on E written 8P(E). For R™, neN the Borel sets are ^(W 1 ). For a probability measure 7 € 2? ''(E) we will denote 
the density with respect to an appropriate cr-finite measure dx as 7(1). The total variation distance between two 
probability measures 71, 72 £ 9 s (E) is written as H71 — 72H = sup^gg |7i(A) — 72 (A) |. For a vector (x%, . . . , Xj), the 
compact notation Xi-j is used; if i > j Xi-j is a null vector. For a vector x\-j, |xi : j|i is the Li— norm. The convention 
Y[q = 1 is adopted. Also, min{a, b} is denoted as a A b and Ia(x) is the indicator of a set A. Let E be a countable 
(possibly infinite dimensional) state-space, then 
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denotes the class of stochastic matrices which possess a stationary distribution. In addition, we will denote as 
Cj = (0, . . . , 0, 1, 0, . . . , 0) the (i-dimensional vector whose i th element is 1 and is everywhere else. Finally, for the 
discrete collection of integers we will use the notation = {1, . . . , d}. 

2 Problem Formulation 

2.1 Preliminaries 

Let 9 be a parameter vector on (0,^(0)), C M. de with an associated prior ng G ^(8). The stopped process 
{X t }t>o is a (E, £)— valued discrete-time Markov process defined on a probability space (f2, J?,Pg), where Pg is a 
probability measure defined for every 9 G such that for every A G JP, We (A) is BS{&)— measurable. For simplicity 
will we will assume throughout the paper that the Markov process is homogeneous. The state of the process {Xt}t>o 
begins its evolution in a non empty set Bq obeying an initial distribution vg : Bq — > &(Bo) and a Markov transition 
kernel Pg : E x 9 — > £?>(E). The process is killed once it reaches a non-empty target set A C Bq G & such that 
Pg (Xq G Bq \ A) = 1. The associated stopping time is defined as 

T = inf{i > : X t G A}, 

where it is assumed that Pg(T < oo) = 1 and T G I, where 1 is a collection of positive integer values related to 
possible stopping times. 

In this paper we assume that we have no direct access to the state of the process. Instead the evolution of the 
state of the process generates a random observations' vector, which we will denote as Y. The realisation of this 
observations' vector is denoted as y and assume that it takes value in some non empty set F . We will also assume 
that there is no restriction on A depending on the observed data y, but to simplify exposition this will be omitted 
from the notation. 

In the context of Bayesian inference we are interested in the posterior distribution: 

Tr(d9,dx 0:T ,T\y) oc jg(dx 0:T , y, T)ir(d6), (1) 

where t G X is the stopping time, -Kg is the prior distribution and jg is the un-normalised complete-data likelihood 
with the normalising constant of this quantity being: 

Ze = y~] -fg(dx : T ,y,T)dx : T . 

The subscript on 9 will be used throughout to explicitly denote the conditional dependance on the parameter 9. 
Given the specific structure of the stopped processes one may write jg as 

T 

7e(dx :r,y,T) = £,g(y\xo; T )l( A cy xA (xo: T )isg(dx )Y[Pe(dxt\xt-i), (2) 

t=i 

where £g : x F x (Ureil 7 "} x E T+1 ) — > (0, 1) is the likelihood of the data given the trajectory of the process. 
Throughout, it will be assumed that for any 9 G 0, y G F, 70 admits a density jg(xo: T ,y,T) with respect to a 
o — finite measure dx§-. T on E = (Ureii 1 "} x ^ T+1 ) an d the posterior and prior distributions tt, p admit densities 
W, p respectively both defined with respect to appropriate a— finite dominating measures. 

Note that ([T]) is expressed as an inference problem for (6, xq- t ,t) and not only 9. The overall motivation 
originates from being able to design a MCMC that can sample from 71", which requires one to write the target (or 
an unbiased estimate of it) up-to a normalizing constant [3]. Still, our primary interest lies in Bayesian inference 
for the parameter and this can be recovered by the marginals of 7r with respect to 9. As it will become clear in 
Section |4] the numerical overhead when augmenting the argument of the posterior is necessary and we believe that 
the marginals with respect to xo :r , t might also be useful by-products. 

2.2 Motivating example: the coalescent 

The framework presented so far is rather abstract, so we introduce the coalescent model as a motivating example. 
In Figure [T] we present a particular realisation of the coalescent for two genetic types {A, C}. The process starts at 
epoch t = when the most recent common ancestor (MRCA) splits into two versions of itself. In this example A is 
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Figure 1: Coalescent model example: each of {A, C} denote the possible genetic type of observed chromosomes. 
In this example we have d = 2 and to = 3. The tree propagates forward in time form the MRCA downwards by 
a sequence of split and mutation moves. Arrows denote a mutation of one type of a chromosome to another. The 
name of the process originates from viewing the tree backwards in time (from bottom to top) where the points 
where the graph join are coalescent events. 

chosen to be the MCRA and the process continues to evolve by split and mutation moves. At the stopping point 
(here t — 4) we observe some data y, which corresponds to the number of genes for each genetic type. 

In general we will assume there are d different genetic types. The latent state of the process x\ is composed of 
the number of genes of each type i at epoch t of the process and let also Xt = (x\, . . . , xf). The process begins by 
default when the first split occurs, so the Markov chain {X t } t>0 is initialised by the density 



ve{xa) 



Vi if xo = 2e, : 
otherwise. 



and is propagated using the following transition density: 



Pe(x t \x t -i) 



\xt 7 

kf - 
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if x t = x t -i 

if x t = xt-i 
otherwise, 



+ e/ (mutation) 
(split) 



where ei is defined in Section 11.11 Here the first transition type corresponds to individuals changing type and is 
called mutation, e.g. A — > C at t £ {1, 3} in Figure[TJ The second transition is called a split event, e.g. t £ {0, 2} in 
the example of Figure [T] To avoid any confusion we stress that in Figure [1] we present a particular realisation of the 
process that is composed by a sequence of alternate split and mutations, but this is not the only possible sequence. 
For example, the bottom of the tree could have be obtained with C being the possible MCRA and a sequence of 
two consecutive splits and a mutation. 

The process is stopped at epoch r when the number of individuals in the population reaches m. So for the state 
space we define: 



E = \J[{t}xE 



tex 



rt+i 



E 
X 



{x : x e (Z + ) d and 2 < |x|i < to} 
{to, to + 1, . . . }, 
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and for the initial and terminal sets we have: 



B a = {x : x e {0, 2} d and \x\t = 2} 
A = {x : x € (Z + ) d and = m}. 

The data is generated by setting y := y 1:d = x T (G A), which corresponds to the counts of genes that have been 
observed. In the example of Figure Q] this corresponds to m — 3. Hence for the complete likelihood we have: 



ve{xo)\\_Pe{xt\xt-i) 
t=i 



(3) 



As expected, the density is only non-zero if at time r x T matches the data y exactly. 

Our objective is to infer the genetic parameters 6 = (/i, i?), where [i G M. + and R G S(Td) and hence the 
parameter space can be written as = R + x S(T^). To facilitate Monte Carlo inference, one can reverse the 
time parameter and simulate backward from the data. This is now detailed in the context of importance sampling 
following the approach in |21| . 

2.2.1 Importance sampling for the coalescent model 

To sample realisations of the process for a given 9 G 0, importance sampling is adopted but with time reversed. 
First we introduce a time reversed Markov kernel Mg with density Mg(xt-x\xt). This is used as an importance 
sampling proposal where sampling is performed backwards in time and the weighting forward in time. We initialise 
using the data and simulate the coalescent tree backward in time until two individuals remain of the same type. 
This procedure ensures that the data is hit when the tree is considered forward in time. 

The process defined backward in time can be interpreted as a stopped Markov process with the definitions of 
the initial and terminal sets appropriately modified. For convenience we will consider the reverse event sequence 
of the previous section, i.e we posed the problem backwards in time with the reverse index being j. The proposal 
density for the full path starting from the bottom of the tree and stopping at its root can be written as 

Qeixa-.r) =^B n{x:x= y }(x )< Yi M e (xj \xj-i) n Bo (x T ). 

I. j=l > 

With reference to we have 

J {x O :r,y,T) - — — Ve{x T )l I I — / ^ >q g (x :r)- 

m-l + ^ ml { JLX Mg(x j \x j - 1 ) J 

Then the marginal likelihood can be obtained 

Ze - E / MxA ft ^p^ }uxo-,)dx .. r . 

m-l + n ml f^-JE^ V Mg(xj\xj-i) J 



In |31| the authors derive an optimal proposal Mg with respect to the variance of the marginal likelihood estimator. 
For the sake of brevity we omit any further details. In the current setup where there is only mutation and 
coalescences, the stopped-process can be integrated out [20|. but this is not typically possible in more complex 
scenarios. A more complicated problem, including migration, is presented in Section 15.21 Finally, we remark that 
the relevance of the marginal likelihood above will become clear later in Section U] as a crucial element in numerical 
algorithms for inferring 6. 



3 Multi-Level Sequential Monte Carlo Methods 

In this section we shall briefly introduce generic SMC without extensive details. We refer the reader for a more 
detailed description to [El [18]. To ease exposition, when presenting generic SMC, we shall drop the dependence 
upon parameter 9. 
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Algorithm 1 Generic SMC Algorithm 



Initialisation, n = 1: 
For i = 1,...,N 

1. Sample uf ~ Mi . 

2. Compute weights 



For n = 2, . . . ,p, 

For i = 1, ...,JV, 



1 M l{ uf) 1 EjLi^ 



1. Resampling: sample index d l n _ l ~ /(-|W n _i), where = . . . , W^_\). 

2. Sample u$ ~ ^n('l%?nli ) and set ti^ = ,«!''). 

3. Compute weights 



SMC algorithms are designed to simulate from a sequence of probability distributions m , iz%, . . . , n p defined on 
state space of increasing dimension, namely (Gi, (Gi x G2, Sfi <8> %),••• , (Gi x • • • x G p ,^i ® • • • ® Sfp)- Each 
distribution in the sequence is assumed to possess densities with respect to a common dominating measure: 

7r„(ltl:„) = 

with each un- normalised density being 7 f n : G\ x ■ • • x G n — > R+ and the normalizing constant being Z n . We will 
assume throughout the article that there are natural choices for {j n } and that we can evaluate each j n point- wise. 
In addition, we do not require knowledge of Z n . 

3.1 Generic SMC algorithm 

SMC algorithms approximate {7f n }f l=1 recursively by propagating a collection of properly weighted samples, called 
particles, using a combination of importance sampling and resampling steps. For the importance sampling part of 
the algorithm, at each step n of the algorithm, we will use general proposal kernels M n with densities M n , which 
possess normalizing constants that do not depend on the simulated paths. A typical SMC algorithm is given in 
Algorithm [T] and we obtain the following SMC approximations for ir n , 

N 

n»(du 1:n ) = Y,WWS u v(du 1:n ) 
3=1 

and for the normalizing constant Z n : 

fe=i ^ j=i > 

In this paper we will use / to be the multinomial distribution. Then the resampled index of the ancestor 
of particle i at time n, namely a l n _i £ {1, . . . , N}, is also a random variable with value chosen with probability 

W^-i^ . For each time n, we will denote the complete collection of ancestors obtained from the resampling step as 
a-n = (a,\ , . . . , ) and the randomly simulated values of the state obtained during sampling (step 2 for n > 2) as 
). We will also denote ai :p , u\- p the concatenated vector of all these variables obtained during 
the simulations from time n . — l,...,p. Note u\ :p the is a vector containing all N x p simulated states and should 
not be confused with the particle sample of the path {u^. p , . . . , u[ N J). 
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Furthermore, the joint density of all the sampled particles and the resampled indices is 

/ N \ p / N \ 

v(u, :p , aupi ) = n^^i ) n {\{wi:}{ i) M^\u:h i \---A al) ) ), (5) 

^ i=l ' n=2 ^ i=l ' 

The complete ancestral genealogy at each time can always traced back by defining an ancestry sequence b\. n for 

every i G Tn and n > 2. In particular, we set the elements of b\. n using the backward recursion 6* = a n n+1 
where bt = i. In this context one can view SMC approximations as random probability measures induced by the 
imputed random genealogy ai :n and all the possible simulated state sequences that can be obtained using u\- n . This 
interpretation of SMC approximations was introduced in |Tj and will be later used together with ip(ui :p , ai : p_i) for 
establishing the complex extended target distribution of PMCMC. 

3.2 Multi-Level SMC implementation 

For different classes of problems one can find a variety of enhanced SMC algorithms; see e.g. [18 . In the context 
of stopped processes, a multi-level SMC implementation was proposed in and the approach was illustrated for 
the coalescent model of Section |2~2"1 We consider a modified approach along the lines of Section 12.2 of [TS] which 
seems better suited for general stopped processes and can provably yield estimators of much lower variance relative 
to vanilla SMC. 

Introduce an arbitrary sequence of ^"—nested sets 

S D B x ■ ■ ■ D B p = A, p>2 

with the corresponding stopping times denoted as 

Ti = in£{t > : Xt G B{\, 1<1< P , 

Note that the Markov property of X t implies < 71 < T2 < ■ • • < T p = T ■ 

The implementation of multi-level SMC differs from the generic algorithm of Section 13.11 in that between suc- 
cessive resampling steps one proceeds by propagating in parallel trajectories of X^\ until the set B n is reached for 

(?) (i) 

each j & T^r. For a given j G IV the path Xq.£ is "frozen" once Xg. t G B n , until the remaining particles reach B n 
and then a resampling step is performed. More formally denote for n = 1 

X\ = (xo:t 1 ,ti) G {x 0:Tl ,n ■ x 0:Tl -i G B \Bi, x Tl g Si} 
where t\ is a realisation for the stopping time Ti and similarly for 2 < n < p we have 

(^T n _i-(-l:T n ) "T~7l) G \x Tn l J^\- Tn , T n . XT n -i + l:T n — l G B n —l \ B 7l7 x Tn G B n }. 

Multi- level SMC is a SMC algorithm which ultimately targets a sequence of distributions {ir n } each defined on 
a space 

En = (J {Tn} X E T - + 1 (6) 

r„GX„ 

where n G T p , p > 2 and X±, . . . ,I p are finite collections of positive integer values related to the stopping times 
71, • ■ • , T p respectively. In the spirit of generic SMC define intermediate target densities 7f n w.r.t to an appropriate 
a- finite dominating measure dX n . We will assume there exists a natural sequence of densities \jf n = -^}i< n <p 
obeying the restriction 7^ = 7 e so that the last target density 7 p coincides with 7^ in <j2j> . Note that we define a 
sequence of p target densities, but this time the dimension of 7 f n compared to 7 n _i grows with a random increment 
of r n — T n —\. In addition, 7 p should clearly depend on the value of 9, but this suppressed in the notation. The 
following proposition is a direct consequence of the Markov property: 

Proposition 3.1. Assume Pe(T < 00) = 1. Then the stochastic sequence defined {X n )i<n<p forms a Markov 
chain taking values in E n defined ([6]). In addition, for any bounded measurable function h : E n — > K, then 
hn H x p)lp(. dx v) = Erei/^+i h(xo: T ,T)ie(dxo; T ,y,T). 

The proof can be found in |151 Proposition 12.2.2, page 438], [T5l Proposition 12.2.4, page 444] and the second 
part is due to 7 = lg- 
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We will present multi-level SMC based as a particular implementation of the generic SMC algorithm. Firstly 
we replace u n ,u\- n with X n ,X\ :n respectively. Contrary to the presentation of Algorithm [1] for multi-level SMC 
we will use a homogeneous Markov importance sampling kernel Mg(dxt\xt-i), where Mg : 9 x E — > ^(E), 
Mg{dxo\x-x) = Mg(dxo) by convention and Mg is the corresponding density w.r.t. dx. To compute the importance 
sampling weights of step 3 for n > 2 in Algorithm [T] we use instead: 

W n {A.\, . . . , A. n ) 



7„_l(^l, • • • , #n-l) n^r„_ 1 + l Mg{xi\xi- X ) 

and for step 2 at n = 1: 



n[ioM»(i,in-i) 

To simplify notation from herein we write 

ri 

A<i(^i) = n 37 «( a: 'l a; '- 1 ) 

(=0 

and given p, for any 2 < n < p we have 

M„(A^|^_i)= H Mgix^xi^), 

i=T„_l+l 

where again we have suppressed the 6>-dependance of M. n in the notation. We present the multi-level SMC algorithm 
in Algorithm [2] Note here we include a procedure whereby at each stage n, particles that do not reach B n before 
time t n are rejected by assigning them a zero weight, whereas before it was hinted that resampling is performed 
when all particles reach B n . Similar to ([5]), it is clear that the joint probability density of all the random variables 
used to implement a multi-level SMC algorithm with multinomial resampling is given by: 

/ N \ p / N \ 

where X\ :p is defined similarly to u\- p . Finally, recall by construction Z p = Zg so the approximation of the 
normalizing constant of 70 for a fixed 9 is 

N N 

(8) 



v ( 1 1 

n=l 7 = 1 J 



3.2.1 Setting the levels 

We will begin by showing how the levels can be set for the coalescent example of Section 12.21 We will proceed in 
the spirit of Section 12.2.11 and consider the backward process so that the "time" indexing is set to start from the 
bottom of the tree towards the root. We introduce a a collection of integers m > h > h > ■ ■ ■ > lp = 2 and define 

B a = {xe (Z+ U {0}) d : x = y},n = 0, 

B n = {xE (Z+ U {0}) d : \x\ x = l n }, l<n<p. 

Clearly we have B n ^\ D B n , 1 < n < p and B p = A. One can also write the sequence of target densities for the 
multi-level setting as: 

7l(xo-. Tl ,n) = — ^- '-I {y} (x Q )T[Pg{xi^ 1 \xi)I {x eBn} (x tn ), 

m — 1 + (x rrv. 

ln( X 0:T n ,T n ) =7n-l( a; 0:-r n _ 1 ,r n _l) J| Pe(xi-1 \xi)^{x t „ 6B„} n=l,...,jJ. 

i=T„_!+l 



Algorithm 2 Multi-level SMC Algorithm 



Initialisation, n = 1: 
For i = 1, ...,JV 

1. For i = 1,.. 

(a) Sample xf ~ M e (-|x^i). 

(b) If xf G Si set rf 5 = i , Af x (<) = ( , ) ) and go to step 2. 



2. Compute weights 



For n = 2, . . . ,p, 

For i = 1,...,N, 



1. Resampling: sample index a^_ x ~ /(-|W n _i), where W n _i = (W^ l5 . . . , W^_\). 

2. For i = r^ 1 + l,...,i„: 

(a) Sample xf ] ~ Mfl(-|a£2i). 

(b) If G £?„ set = i , = (x^J^ +i (i)J T « 1 ^ an d g° to step 3. 

3. Set = (<^i :7 7-i >'^'i^)• 
4. Compute weights 
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Algorithm 3 Particle independent Metropolis-algorithm (PIMH) 



1. Sample X\-. p , ai :p _i from (J7J using the multi-level implementation of Algorithm [T] detailed in Section (|3.2p and 
compute Z p . Sample k ~ f{'\W p ) . 

2. Set ^(0) = (*(0),^i : p(0),ai.p_i(0)) = (*, X 1:p , ai ip _i) and Z p (0) = Z p . 

3. For i = 

(a) Propose a new X'i ip , a'i. p and fc' as in step 1 and compute Z' p , 

(b) Accept this as the new state of the chain with probability 1 A -^-^ — -. If we accept, set £(£) = 

(fc(i), <?i :p (i), a.i : p_i(i)) = (fc', ^{.p, a' 1 . p _ 1 ) and Z v (i) = Z' p , Otherwise reject, = £(i — 1) and 
Zp(£) = Z p (i - 1). 



The major design problem that remains in general is that given any candidates for {M n ,g}, how to set the spacing 
(in some sense) of the {B n } and how many levels are needed so that good SMC algorithms can be constructed. 
That is, if the {B n } are far apart, then one can expect that weights will degenerate very quickly and if the {B n } are 
too close that the algorithm will resample too often and hence lead to poor estimates. For instance, in the context 
of the coalescent example of Section l2~2l if one uses the above construction for {B n } the importance weight at the 
n-th resampling time is 

/ s tt Pe(xi-i\xi) 

Wn{XQ:rJ= = — -I{^„ S B„} [x Tn ), 

l=r„ 1+ 1 Me,n(xi\Xl^) 

Now, in general for any {l n }n=i an d P it is hard to know beforehand how much better (or not) the resulting multi- 
level algorithm will perform relative to a vanilla SMC algorithm. Whilst [11] show empirically that in most cases 
one should expect a considerable improvement, there 9 is considered to be fixed. In this case one could design the 
levels sensibly using offline heuristics or more advanced systematic methods using optimal control [T3] or adaptive 
simulation [8, 9J, e.g. by setting the next level using the median of a pre-specified rank of the particle sample. What 
we aim to establish in the next section is that when 9 is varies as in the context of MCMC algorithms, one can both 
construct PMCMC algorithms based on multi-level SMC and more importantly easily design for each 9 different 
sequences for {B n } based on similar ideas. 

4 Multi-Level Particle Markov Chain Monte Carlo 

Particle Markov Chain Monte Carlo (PMCMC) methods are MCMC algorithms, which use all the random variables 
generated by SMC approximations as proposals. As in standard MCMC the idea is to run an ergodic Markov chain 
to obtain samples from the distribution of interest. The difference lies that in order use the simulated variables 
from SMC, one defines a complex invariant distribution for the MCMC on an extended state space. This extended 
target is such that a marginal of this invariant distribution is the one of interest. 
This section aims on providing insight to the following questions: 

1. Is it valid in general to use multi-level SMC within PMCMC? 

2. Given that it is, how can we use the levels to improve the mixing of PMCMC? 

The answer to the first question seems rather obvious, so we will provide some standard but rather strong conditions 
for which multi-level PMCMC is valid. For the second question we will propose an extension to PMMH that adapts 
the level sets used to 9 at every iteration of PMCMC. [1] introduces three different and generic PMCMC algorithms: 
particle independent Metropolis Hastings algorithm (PIMH), particle marginal Metropolis Hastings (PMMH) and 
particle Gibbs samplers. In the remainder of the paper we will only focus on the first two of these. 

4.1 Particle independent Metropolis Hastings (PIMH) 

We will begin by presenting the simplest generic algorithm found in [T|, namely the particle independent Metropolis 
Hastings algorithm (PIMH) . In this case 9 and p are fixed and PIMH is designed to sample from the pre-specified 
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target distribution ir p also considered in Section 13.11 Although PIMH is not useful for parameter inference it is 
included for pedagogic purposes. One must bear in mind that PIMH is the most basic of all PMCMC algorithms. 
As such it is easier to analyse but still can provide useful intuition that can be used later in the context of PMMH 
and varying 9. 

PIMH is presented in Algorithm [3] It can be shown, using similar arguments to pQ, that the invariant density 
of the Markov kernel above is exactly (see the proof of Proposition 14. 2p 

where tp is as in (J7) and as before we have bp = k and 6^ = a n " +1 for every k,n . Note that Wp admits the target 
density of interest, tt p as the marginal, when k and ai :p _i are integrated out. 

We commence by briefly investigating some convergence properties of PIMH with multi- level SMC. Even though 
the scope of PIMH is not parameter inference, one can use insight on what properties are desired by multi-level 
SMC for PIMH when designing other PMCMC algorithms used for parameter inference. We begin with posing the 
following mixing and regularity assumption: 

(Al) For every 9 £ and p £ I there exist a <p £ (0, 1) such that for every (x, x') £ E x E: 

ip < M e (x'|x) < ip- 1 
There exist a p £ (0, 1) such that for 1 < n < p and every X\ :n £ E n : 

The stopping times are finite, that is for 1 < n < p there exist a f n < oo such that 

Assumption (A2J is rather strong, but are often used in the analysis of these kind of algorithms [U [15] because 
they simplify the proofs to a large extent. Recall that 9 £ Q and p £ I are fixed. We proceed by stating the 
following proposition: 

Proposition 4.1. Assume (J^Tj). Then for N > 1 Algorithm^ generates a sequence {Xi-.p(i)) i>0 that for any i > 1, 
f (0) e T ( ^ 1)N+1 xE,9£Q satisfies: 



\\£aw(X 1:p (i) £ .|f(0)) - tt p (-)|| <U-Zp ((p^) 2 ^ ^) 



The proof can be found in the appendix. The following remarks are generalised and do not always hold, but 
provide some intuition for the ideas that follow. The result shows intrinsically that as the supremum of the sum of 
the stopping times with respect to {i? n }^ =1 gets smaller, so does the convergence rate increase. This can be also 
linked to the variance of the estimator of Z p , which is well known to increase linearly with p |15l Theorem 12.2.2, 
pages 451-453]. Shorter stopping times will typically yield lower variance and hence better MCMC convergence 
properties. On the other hand often 7 P will be larger for longer p and longer stopping times (Proposition 14.11 is 
derived for a fixed p). In addition, sampling a stopped process is easier using a higher number of levels. In summary, 
the tradeoff is that although it is more convenient to use more auxiliary variables for simulating the process, these 
will slow down the mixing of PMCMC. In practice one balances this by trying to use a moderate number of levels 
for which most the particles to reach A. This tradeoff serves as a motivation for developing flexible schemes to vary 
p, {B n }P l=1 with 9 in the PMCMC algorithm presented later in Section |4~3"1 

4.2 Particle marginal Metropolis Hastings (PMMH) 

In the remainder of this section we will focus on using a multi-level SMC implementation within a PMMH algorithm. 
Given the commentary in Section [3. 2 1 and our interest in drawing inference on 9 £ O, it seems that using multi- level 
SMC within PMCMC should be highly beneficial. Recall ((TJ) can be expressed in terms of densities as: 

w(e,x 1:p )cx%(x 1:p )p(e) (9) 
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Algorithm 4 Particle marginal Metropolis Hastings using multi-Level SMC. 



1. Sample 0(0) ~ p(-). Given 9(0) sample A , i- p (0),ai. p _i(0) using multi-level SMC and compute Zgr y Sample 
k~f(-\W p ) . 

2. Set £(0) = (0(O),fc(O),* 1:p (O),a 1:p _i(O)) and Z e (0) = Z e{0) . 

3. For i = l,...,K: 

(a) Sample 6' ~ q(-\0(i — 1)); given 9' propose a new X'i-. p , a' 1 . p _ 1 and k' as in step 1 and compute Z'g,. 

(b) Accept this as the new state of the chain with probability 

Z>,p(e') y q(9(i-W) 
Zg(i-l)p(6) q(9'\9(i-l))' 

If we accept, set £(i) = (6>(£), k(i), X\. p (i), ai :p _i(i)) = (0', fc', ^{.p, : ) and Ze(i) = Z' e ,. Otherwise 
reject, £(i) = £(i — 1) and Z$(i) = Zg(i — 1). 



and let the marginal density given by 

*(.9) = y2 Tr(9 : x :r,T\y)dx 0:T . 
rei jET+1 

For the time being we will consider the case when p is fixed. In the context of our stopped Markov process, we 
propose a PMMH algorithm targeting if (9, Xi :p ) in Algorithm 0] 

We will establish the invariant density and convergence of this algorithm, under the following assumption: 

(A2) For any 9 € O and pglwe define the following sets for n = 1, . . . , p: S„ = {X\. n £ E n : j„(Xi :n ) > 0} and 
Q e n = {X Un € ~E n : r in-x{Xx:n-i)M$, n {X n \X n -i) > 0}. For any 9 e O we have that C Q e n . In addition 
the ideal Metropolis Hastings targeting if (9) using proposal density q(9'\9) is irreducible and aperiodic. 

This assumption contains Assumptions 5 and 6 of [T] modified to our problem with a simple change of notations. 
We proceed with the following result: 

Proposition 4.2. Assume then for any N > 1; 

1. The invariant density of the procedure described in Algorithm^ is on the space x 1 ) Ar+1 x E n and has 
the representation 



^(9^X 1 . p M- P -x) = ^1^1 



where it is as in @ and tpg as in ([7]). In addition, (|10p admits ir(9) as a marginal. 
2. Algorithm^ generates a sequence (9(i), Xi [p (i)) ,>o such that 

lim \\Caw(9(i),X 1:p (i) e •) — tt(-)|| =0 

i— >oo 

where tt is as in ([lj . 

The proof of the result is in the Appendix. The result is based on Theorem 4 of [1] . Note that Algorithm |4] 
presented in a generic form of a "vanilla" PMMH algorithm, so it can be enhanced using various strategies. For 
example, it is possible to add block updating of the latent variables or backward simulation in the context of a 
particle Gibbs version [33J . In the next section, we propose a flexible scheme that allows to set a different number 
of levels after a new 9' is proposed. 
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Algorithm 5 Particle marginal Metropolis Hastings using multi-Level SMC with adaptive level sets. 



1. Sample 9(0) ~ p(-). Given 9(0): sample v(0) from Ag( ), then <-fi :p („(o))(0), a 1: p(„(o))-i(0) using multi-level 
SMC and compute Zg( y Sample k ~ f(-\W p ) . 

2. Set £(0) = (fl(0) J w(0) ) fc(0),^i. p (0),ai.p_i(0)) and Zg(0) = Z e{0) . 

3. For i = l,...,K: 

(a) Sample 9' ~ <z(-|#(* — 1)); sample u' from Ag> and X'i : p(yi^, ax-p(«')— l' ^' as ^ n s ^ e P ^ anc ^ compute Z' g ,. 

(b) Accept this as the new state of the chain with probability 

1A %m y q(6(*~W) 
Zg(i-\)p(9) q(9'\6(i-l)Y 

If we accept, set = (i), ^^(^(i))-! (i)) = (o\v\k\Xl p ^,ya! Up{v/) _^ and 

Zg(i) = Z' g ,. Otherwise reject, £(£) = £(i — 1) and Zg(i) = Zg(i — 1). 



4.3 Adapting the level sets 

The remaining design issue for PMMH is how to tune multi-level SMC by choosing p and {B n }^ =1 . Whilst, for 
a fixed S £ 6, one could solve the problem with preliminary runs, when 9 varies this is not an option. In general 
the value of 9 should dictate how small or large p should be to facilitate an efficient SMC algorithm. Hence, to 
obtain a more accurate estimate of the marginal likelihood and thus an efficient MCMC algorithm, we need to 
consider adaptive strategies to propose randomly a different number of levels p and levels' sequence {B n }^ =1 for 
each 9(i) sampled at every PMMH iteration i. To ease exposition we will assume that p, {B n }^ =1 can be expressed 
as functions of an arbitrary auxiliary parameter v. 

Given 9(i) is a random variable, the main questions we wish to address is how to perform such an adaptive 
strategy consistently. An important point, is the fact that since we are interested in parameter inference, it is 
required that the marginal of the PMMH invariant density is Tf (6). This can be ensured (see Proposition 14. 3[) by 
introducing at each PMMH iteration, the parameters that form the level sets v(i) as an auxiliary process, which 
given 9(i) is conditionally independent of k(i), X\. p (i), ai :p _i(i). This way we define an extended target for the 
MCMC algorithm, which includes p and {B n }^ =1 in the target variables. It should be noted that this scheme is 
explicitly different from Proposition 1 of [28], where the MCMC transition kernel at iteration i is dependent upon 
an auxiliary process. Here one just augments the target space with more auxiliary variables. 

Consider now that it is possible at every PMMH iteration i to simulate the auxiliary process v defined upon an 
abstract state-space (V,^). Let this with associated random variable v, be distributed according to Ag, which is 
assumed to possess a density with respect to a. cr— finite measure dv written as Ag. As hinted by the notation A# 
should depend on 9 and v is meant be used to determine the sequence of levels {B„} p n=1 for each 9(i) in PMMH. 
This auxiliary variable will induce for every 9 g O: 

• a random number of level sets p(v) e J C Z+. 

• a sequence of level sets {-Bn (tO }n=i with B p t v \ = A . 

We will assume that for any 9 € O, Proposition 13.11 and ([5)1 will hold Ag— almost everywhere, where this time p 
should be replaced by p(v). This implies that for every 9 € O we have: 

S~2 le(Z0:T pM ,y,T p (y))dx :T pM = V" / le( x 0:T, V, T)dx 0:T , (11) 

where the expression holds Ag— almost everywhere. In Algorithm [5] we propose a PMMH algorithm, which at each 
step i uses 9(i) to adapt the levels {B n (v(i))}„2i ■ For Algorithm [5] we present the following proposition that 
verifies varying the level sets in this way is theoretically valid: 

Proposition 4.3. Assume an d (fTTj) hold. Then, for any N > 1; 
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1. The invariant density of the procedure in Algorithm^ is defined on the space 



and has the representation 





where tt is as in ([9]) and %j)g is as in |7j). In addition, ((12)) admits tt(9) as a marginal. 
2. The generated sequence (0(i), Xi: P (v(i))(i)) i>o satisfies 

lim \\£aw(6(i),X Up / v u))(i) e ■) -'"'(•) II =0 




)} 



(12) 



where 7r is as in ([!}. 

The proof is contained in the Appendix. We are essentially using an auxiliary framework similar to [3]. As in 
([T]) we included xq- t ,t in the target posterior, when we were primarily interested in 9, this time we augment the 
target posterior with v and the SMC variables X 1 . p ( v < j ,a. 1 . p ( v < ) _ 1 , which is a consequence of using PMCMC. The 
disadvantage is that as the space of the posterior increases it is expected that the mixing of the algorithm will be 
slower. This could be improved if we have opted xq :t ,t and v to be dependent on each other given 9, but this 
would need additional assumptions for the structure of 79. In addition, in many applications the parameters v that 
determine {_B n }^ =1 appear naturally and v often is low dimensional. Also, in most applications it might seem easier 
to find intuition on how to construct and tune Ag than computing the level sets directly from 9. For example, for 
the coalescent model of Section 12.21 with the mutation matrix R is fixed, one can envisage for a larger value of \x 
coalescent events are less likely and more level sets closer together are needed compared to smaller values of \x. 

5 Numerical Examples 

We will illustrate the performance of PMMH using numerical examples on two models from population genetics. 
The first one deals with the coalescent model of Section 12.21 when a low dimensional dataset is observed. This is 
meant as an academic/toy example suitable for comparing different PMMH implementations. The second example 
is a more realistic application and deals with a coalescent model that allows migration of individual genetic types 
from one sub-group to another [4"Ill4j. In both cases we will illustrate the performance of PMMH implemented with 
a simple intuitive strategy for adapting the level sets. 

5.1 The coalescent model 

We will use a known stochastic matrix R with all entries equal to 1/d. In this example d = 4 with and the dataset 
is y — (10,5,9,5). The parameter-space is set as = [0, 1.5] and a uniform prior will be used. For Mg we will 
use the optimal proposal distributions provided by [31) . The PMMH proposal q(-\fJ,(i — 1)) in Algorithm O is a log 
normal random walk, i.e. we use C = C(* — 1) + 0.4/V(0, 1) with £ = log(/x). 

We will compare PMMH when implemented with a simple adaptive scheme for p and when p is fixed. In the 
latter case we set p = 14. When an adaptive strategy is employed we will sample each time p directly using a 
multinomial distribution defined on {8, . . . , 28} with weights proportional to [jP. In both cases given p we place the 
levels almost equally spaced apart. 

5.1.1 Numerical results 

The adaptive and normal versions were run with N = 50, 100, 200 for 10 5 iterations. In each case the algorithm 
took approximately 2.5, 5, 10 hours to complete when implemented in Matlab and run on a Linux workstation 
using a Intel Core 2 Quad Q9550 CPU at 2.83 GHz. The results are shown in Figure [5] and [3J We observed that 
when we varied the number of levels, this allowed the sampler to traverse through a bigger part of the state space 
compared to when a fixed number of levels is used. As a result the estimated pdf of the adaptive case manages to 
include a second mode that is not seen in the non adaptive case. In the fixed levels case we see a clear improvement 
with increasing N, although the difference in the mixing between N = 100 and 200 is marginal. In the adaptive 
case the sampler performed well even with lower values of N. 
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Figure 2: PMMH for the coalescent without adaptation. A fixed number of 14 level sets is used. Left: estimated 
pdf of // for N = 50, 100, 200. Centre: the trace plot for N = 100. Right: autocorrelation plots for N = 50, 100, 200. 
The average acceptance ratio was 0.07, 0.08 and 0.10 respectively. 




Figure 3: PMMH for the coalescent adaptation. The number of levels sampled is proportional to Far left: 
estimated pdf of /i for N = 50, 100, 200. Central left: the trace plot for N = 100. Central right: histogram of 
number of levels in the posterior for N = 100. Far right: autocorrelation function plots for N = 50, 100, 200. The 
average acceptance ratio was 0.10, 0.11 and 0.13 respectively. 

5.2 The coalescent model with migration 

The model is similar to the one as described in Section |2~2"1 The major difference is that this time genetic types are 
of classified into sub-groups within which most activity happens. In addition, individuals are allowed to migrate 
from one group to another. We commence with a brief description of the model and refer the interested reader to 
[4l [14] for more details. As in Section [2~2l we will consider the process forward in time. Let g be the number of 
groups and the state at time t be composed as the concatenation of g groups of different genetic types as: 

%t = 0^1,4) ■ • ■ i x l,ti ■ • ■ i x g.t^ • •• i x g,t) 

The process under-goes split, mutation and migration transitions as follows: 

Xj = Xj_i + e ai 

Xj Xj — \ -I- C a ^l 

Xj = Xj-i — e a j + epj, 

where a, (3 G {1, ■ ■ ■ ,g} with a ^ /3 and e a ^ is a vector with a zero in every element except the (a — l)g + i -th 
one. Similarly to the simpler model of Section 12.21 the transition probabilities are parameterised by the mutation 
parameter /i, mutation matrix R and a migration matrix G. The latter is a symmetric matrix with zero values on 
the diagonal and positive values on the off-diagonals. Finally the data is generated when at time r the number of 
individuals in the population reaches m, and y = y 1:gd = x T . 

As for the model described in Section 12.21 one can reverse time and employ an backward sampling forward 
weighting importance sampling method; see |14| for the particular implementation details. In our example we 
generated data with m = 100, d = 64 and g = 3. This is quite a challenging set-up. As in the previous example 
we set the mutation matrix R to be known and uniform and we will concentrate on inferring the 6 = (p,G). 
Independent gamma priors with shape and scale parameters equal to 1 were adopted for each of the parameters. 
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5.2.1 Numerical results 



We implemented PMMH using N = 50, 100, 200 and a simple adaptive scheme for p. We allow p £ {10, 20, 33} 
and use approximately equal spacing between the levels. We choose each p with probability proportional to 
piogin+J^^j p r0 p 0sa j s f or t ne parameters were Gaussian random-walks on the log-scale. The algo- 

rithm was implemented in C/C++ was run for 10 5 iterations, which took approximately 3, 6 and 12 hours to 
complete. Whilst the run-time is quite long it can be improved by at least one order of magnitude if the SMC is 
implemented on Graphical Processing Units (GPU) as in |25) . 

For the dataset plotted in Figure @] (left) the results are plotted in Figures @] (right) and [3] The auto-correlation 
and trace plots indicate that the sampler mixes reasonably well for every N. These results in this example are 
encouraging as to the best of our knowledge Bayesian inference has not been attempted for this class of problems. 
We expect that practitioners with insight in the field of population genetics can come with more sophisticated 
MCMC proposals or adaptive schemes for the level sets, so that the methodology can be extended to realistic 
applications. 




Figure 4: Left: Dataset for the Coalescent with Migration. Right: Histogram of number of levels p in the resulting 
posterior for N = 100. 



6 Discussion 

In this article we have presented a multi-level PMCMC algorithm which allows one to perform Bayesian inference 
for the parameters of a latent stopped processes. In terms of methodology the main novelty of the approach is that 
uses auxiliary variables to adaptively compute the level sets with 6. The general structure of this auxiliary variable 
allows it to incorporate the use of independent SMC runs with less particles to set the levels. In the numerical 
examples we demonstrated that the addition auxiliary variables slow down the convergence of PMCMC, but this 
seemed a reasonable compromise in terms of performance compared when fixed number of level sets were used. 
The proposed algorithm requires considerable amount of computation, but to the authors best knowledge for such 
problems there seems to be a lack of alternative approaches. Also, recent developments GPU hardware can be 
adopted to speed up the computations even by orders of magnitude as in |25) . 

There are several extensions to the work here, which may be considered. Firstly, the scheme that is used to 
adapt the level sets relies mainly on intuition. We found simple adaptive implementations to work well in practice. 
In the rare events literature one may find more systematic techniques to design the level sets, based upon optimal 
control |13| or simulation j9|. Although these methods are not examined here, they can be characterised using 
alternative auxiliary variables similar to the ones in Proposition 14.31 so the auxiliary variable framework we use 
is quite generic. In addition, we emphasise that within a PMCMC framework one may also include multi-level 
splitting algorithms instead of SMC, which might appeal practitioners familiar with multi-level splitting. 

Secondly, one could seek to use these ideas within a SMC sampler framework of [16] as done in [12]. As noted 
in the latter article, a sequential formulation can improve the sampling scheme, sometimes at a computational 
complexity that is the same as the original PMCMC algorithm. In addition, this article focuses on the PMMH 
algorithm, so clearly extensions using particle Gibbs and block updates might prove valuable for many applications. 

Finally, from a modelling perspective, it may be of interest to apply our methodology in the context of hidden 
Markov models. In this context, one has 

T 

£{y\xo:r) = n^^kO 

i=0 
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Figure 5: PMMH for for the Coalescent with Migration for TV = 50, 100,200. Top row: estimated pdfs for /it, G12, 
G13, G23 (from left to right). Middle: trace plots for TV = 100. Bottom: autocorrelation function plots. The 
acceptance rate was 0.34, 0.37, 0.4 respectively. 



with gg(-\x) being the conditional likelihood of the observations. It would be important to understand, given a range 
of real applications, the feasibility of statistical inference, combined with the development of our methodology. An 
investigation of the effectiveness of such a scheme when applied to queuing networks is currently underway. 
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Appendix 

Proof. [Proof of Proposition 14. 1| The result is a straight forward application of Theorem 6 of [53] which adapted 
to our notation states: 



\£aw(x 1:p (i) e ~M-)\\ <K» 



1 - 



1 A 



gg(g) 



AE. 



'ipe 



1 A 



4(H) 



z P (0 



where the conditional expectation is the expectation w.r.t. the SMC algorithm (i.e. 5 ~ ipo) and the outer expec- 
tation is w.r.t. the PIMH target (i.e. £ ~ ir^). We also denote the estimate of the normalizing constant as Z p {-) 
with • denoting which random variables generate the estimate. 
Now, clearly via (AQJ 



w n (X .. Tn ) < 



< 



- T n -\-r n 



pip 
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with the convention that To = 0. Thus, it follows that 

p ^ N p r 



n— 1 j — 1 n— 1 



l 



< 



and we obtain: 



^(0 



Note that by assumption Z p (S) (pip) Zjti=1 n < 1 and thus we have 

||£ a «;(* 1:p (i) G -|£(0)) - M-)\\ < f 1 - E^[2p(S) (w) 2 ^ 1 ? " 



Given [T51 Theorem 7.4.2. Equation (7.17), page 239] and the fact that jg is defined to be strictly positive in (-AHJ 
we have that the SMC approximation Z p (-) is an unbiased estimate of the normalizing constant Z p 



[iJp(3)] — Z p , 



and we can easily conclude. 



(13) 
□ 



Proof. [Proof of Proposition 14. 2| The proof of parts 1. and 2. follows the line of arguments used in Theorem 4 of 
PQ, which we will adapt to our set-up. The main difference lies in the multi- level construction and second statement 
regarding the marginal of n N . For the validity of the multi-level set-up we will rely on Proposition 13. 11 

Suppose we design a Metropolis Hastings kernel with invariant density 7?^ and use a proposal q N (6, k, Xi :p , ai :p _i) 

MXi--p,*kp-i)f(k\W p )q(P(i - 1)|0') = MXiv,*i--P-i)W P :k) q(0\0') • Then 



,k,Xi: P ,a.up-i) 
q N (e,k,Xu P ,a 1:p ^ 1 ) 



N-P%(X^)p(0)/Z P 



%(4S) (nLi n- 1 (Ef =1 w n (x^)))p(e) 



V) 



Z q(0\0')' 

where we denote the normalising constant of the posterior in Jl} as: 

Z p p{0)d0 

Therefore the Metropolis-Hastings procedure to sample from Ttp will be as in Algorithm 2J 
Alternatively using similar arguments one we may write 



TTp(0, k, Xl-.p, ai:p_l) 



-^ e (^ 1:p ,ai:p_i)W* 



Summing over k and using the unbiased property of the SMC algorithm in Equation () 13[) it follows that 7f p (■) 
admits tt(9) as a marginal, so the proof of part 1. is complete. 

Part 2. is a direct consequence of Theorem 1 in [3] and Assumption (1^2$). 

□ 

Proof. [Proof of Proposition 14. 3| The proof is the similar as that of Proposition 14.21 For the proof of the first 
statement of part 1. one repeats the same arguments as for Proposition 14.21 with difference being in the inclusion 
of Ag(v) for n N and q N . For the second statement, to get the marginal of tF^, re- write the target as: 



n N (0, k, v, ^i :p ( 1) ),a 1:p („)_i) 



J p(v) 



:p(v) 7 *^-l:p(u) — 1 
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Let TTp (6) denote the marginal of tt^ (•) obtained in Proposition 14.21 Using (fTTj) and the conditional independence 
of v and <?i:p(„), BLi-.pM— 1 ; then for the marginal of 7r Ar (-) w.r.t v, Xi :p t v \, &-Up(v)-i> k we have that 



where the summing over k and integrating w.r.t. A?i.p( v ),ai :p ( v )_i is as in Proposition 

(A.:) 

For part 2. note that the conditional density given k and v and 9 of is 

^,^,)A, W 

n(6)A s {v) V ; 

Hence the sequence (o(i), X^^ (i)j satisfies the required property as direct consequence Theorem 1 in |3] and 
Assumption (42). □ 
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